Enabling Re-executions of Parallel Scientific Workflows Using Runtime Provenance Data

Authors

  • Flavio Costa
  • Daniel de Oliveira
  • Kary A. C. S. Ocaña
  • Eduardo S. Ogasawara
  • Marta Mattoso
Abstract

Capturing provenance data in scientific workflows is a key issue since it allows for reproducibility and evaluation of results. Many of these workflows generate around 100,000 tasks that execute in parallel in High Performance Computing environments, such as large clusters and clouds. SciCumulus is a workflow engine for parallel execution in clouds. Activity failure is almost inevitable in clouds, where virtual machine failures are a reality rather than a possibility. We present SciMultaneous, a service architecture that manages re-executions of failed scientific workflow tasks using runtime provenance. Experimental results on clouds showed that SciMultaneous considerably increases workflow completion and reduces the total execution time of the workflow (considering executions and re-executions) by up to 11.5% compared to ad hoc approaches.
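
The general idea of driving re-executions from runtime provenance can be illustrated with a minimal sketch. This is not the SciMultaneous implementation, whose design is not described in the abstract. The table layout, the status values, and the function names (`open_provenance`, `run_and_record`, `reexecute_failed`, `unfinished`) are all assumptions made for illustration, with an in-memory SQLite database standing in for a provenance repository.

```python
import sqlite3


def open_provenance():
    """Create a toy in-memory provenance store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT, attempts INTEGER)"
    )
    return conn


def run_and_record(conn, task_id, run_task):
    """Run one task and record its outcome in the provenance store."""
    try:
        run_task(task_id)
        status = "FINISHED"
    except Exception:
        status = "FAILED"
    # Update the existing record if there is one, otherwise insert a new one.
    cur = conn.execute(
        "UPDATE task SET status = ?, attempts = attempts + 1 WHERE id = ?",
        (status, task_id),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO task (id, status, attempts) VALUES (?, ?, 1)",
            (task_id, status),
        )
    return status


def reexecute_failed(conn, run_task, max_attempts=3):
    """Query provenance for failed tasks and retry them.

    Stops when no failed task has attempts left (max_attempts is an
    assumed policy, not taken from the paper).
    """
    while True:
        rows = conn.execute(
            "SELECT id FROM task WHERE status = 'FAILED' AND attempts < ? ORDER BY id",
            (max_attempts,),
        ).fetchall()
        if not rows:
            break
        for (task_id,) in rows:
            run_and_record(conn, task_id, run_task)


def unfinished(conn):
    """Return the ids of tasks that have not finished successfully."""
    rows = conn.execute(
        "SELECT id FROM task WHERE status != 'FINISHED' ORDER BY id"
    )
    return [row[0] for row in rows]
```

In this sketch the retry decision depends only on what the provenance store records, so the workflow engine does not need to keep its own list of failures in memory.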


Related resources

Integrating Domain-Data Steering with Code-Profiling Tools to Debug Data-Intensive Workflows

Computer simulations may be composed of scientific programs chained in a coherent flow and executed in High Performance Computing environments. These executions may present anomalies associated with the data that flows in parallel among programs. Several parallel code-profiling tools already support performance analysis, such as Tuning and Analysis Utilities (TAU) or provide fine-grained performa...

Using Domain-Specific Data to Enhance Scientific Workflow Steering Queries

In scientific workflows, provenance data helps scientists understand, evaluate, and reproduce their results. Provenance data generated at runtime can also support workflow steering mechanisms. Providing steering facilities for workflows is considered a challenge because of their dynamic demands during execution. To steer, for example, scientists should be able to suspend (or stop) a workflow execution...

Distilling structure in scientific workflows

Motivation and Objectives: Scientific workflow management systems (e.g., Missier et al., 2010; Ludäscher et al., 2006; Goecks et al., 2011) are increasingly used to specify and manage bioinformatics experiments. An experiment is then represented by a workflow in which a large number of bioinformatics tasks are linked to each other. A workflow specification is a framework for the execution of w...

Understanding Collaborative Studies through Interoperable Workflow Provenance

The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. Currently, most provenance models are designed to capture the provenance related to a single run, and mostly executed by a single user. However, a scientific discovery is often the result of methodical exe...

Performance Evaluation of the Karma Provenance Framework for Scientific Workflows

Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This paper pre...

Publication date: 2012